---
title: "Using RAG Evaluators"
description: "Evaluating RAG Applications"
---

On this page, you will learn how to configure and use RAG evaluators for your custom-built application. RAG evaluators are unique because they need intermediate outputs and internal variables (such as context) to run.

By the end of this tutorial, you will know how to write a RAG application that saves the required internal variables and how to configure the evaluator in Agenta to use these in evaluation.

:::info
Access to internal variables (`internals`) and intermediate outputs (`traces` or `inline traces`) requires version `0.20.0+` of the Python SDK.
:::
:::info
Access to RAG Evaluators and `view trace` in the Web UI requires Agenta Cloud and is not available in Agenta OSS.
:::

## Developing the RAG Application

Let's consider a simple RAG Application (the source code can be found [here](https://github.com/Agenta-AI/agenta/tree/main/examples/rag_applications/mflix)).

The application fetches `count` movies on the topic of `topic` within the genre of `genre`, then generates an initial summary from those movies, and finally summarizes the report twice in a row, arbitrarily.

```python
async def rag(topic: str, genre: str, count: int = 5):
    result = await retriever(topic, genre, count)
    result = await reporter(topic, genre, count, result["movies"])
    result = await summarizer(topic, genre, result["report"])
    result = await summarizer(topic, genre, result["report"])

    return result["report"]
```

The retriever simply performs semantic search on the formatted prompt fetched from Agenta.

```python
async def retriever(topic: str, genre: str, count: int) -> dict:
    prompt = ag.config.retriever_prompt.format(topic=topic, genre=genre)
    topk = count * ag.config.retriever_multiplier

    query = embed(prompt)
    result = search(query["embedding"], topk)

    movies = [
        f"{movie['title']} ({movie['year']}) in {movie['genres']}: {movie['plot']}"
        for movie in result
    ]

    return {"movies": movies}
```

The reporter processes the set of movies retrieved in the previous step and generates a report.

```python
async def reporter(topic: str, genre: str, count: int, movies: dict) -> dict:
    context = ag.config.generator_context_prompt.format(movies="\n".join(movies))
    instructions = ag.config.generator_instructions_prompt.format(
        count=count, topic=topic, genre=genre
    )

    prompts = {"system": context, "user": instructions}
    opts = {
        "model": ag.config.generator_model,
        "temperature": ag.config.generator_temperature,
    }

    result = await chat(prompts, opts)

    report = result["message"]

    return {"report": report}
```

The summarizer finally improves on the reporter's output, in two passes.

```python
async def summarizer(topic: str, genre: str, report: dict) -> dict:
    context = ag.config.summarizer_context_prompt
    instructions = ag.config.summarizer_instructions_prompt.format(
        topic=topic, genre=genre, report=report
    )

    prompts = {"system": context, "user": instructions}
    opts = {
        "model": ag.config.summarizer_model,
        "temperature": ag.config.summarizer_temperature,
    }

    result = await chat(prompts, opts)

    report = result["message"]

    return {"report": report}
```

## Testing the RAG Avaluation

Upon instrumenting and serving the RAG Application with Agenta, you should see the output of the second summarizer.

<img className="dark:hidden" src="/images/basic_guides/rag_output_light.png" />
<img className="hidden dark:block" src="/images/basic_guides/rag_output_dark.png" />

Upon clicking on `view trace`, you should be able to see the `inputs` and `outputs` for each instrumented stage in the RAG workflow.

<img className="dark:hidden" src="/images/basic_guides/rag_initial_trace_light.png" />
<img className="hidden dark:block" src="/images/basic_guides/rag_initial_trace_dark.png" />

Now that we have a RAG Application, let's create a RAG evaluator.

## Creating the RAG Evaluator

On the evlauators page, click on `RAG Faithfulness` for example.

<img className="dark:hidden" src="/images/basic_guides/rag_initial_faithfulness_light.png" />
<img className="hidden dark:block" src="/images/basic_guides/rag_initial_faithfulness_dark.png" />
RAG evaluators, based on [RAGAS](https://docs.ragas.io/) are different from other evaluators in Agenta in that they
often require internal variables. For instance,
[Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html) and [Context
Relevancy](https://docs.ragas.io/en/stable/concepts/metrics/context_precision.html) both require `question`, `answer`,
and `contexts`.

From the trace we saw before, we could say that `answer` maps to the second rag summarizer output report, denoted by `rag.summarizer[1].outputs.report`. Similarly, the `contexts` map to the rag retriever output movies, denoted by `rag.retriever.outputs.movies`.

However, the `question`, which corresponds to the formatted prompt sent to the retriever (or the reporter) is not part of the inputs or the outputs of any stage in the workflow.

Next, let's see how and where you to find that variable.

## Updating the RAG Application

In Agenta, when instrumenting stages of a workflow, there is a utility to add internal variables, called `internals`, to the stage's span.

```python
async def retriever(topic: str, genre: str, count: int) -> dict:
    prompt = ag.config.retriever_prompt.format(topic=topic, genre=genre)
    topk = count * ag.config.retriever_multiplier

    ag.tracing.store_internals({"prompt": prompt})

    query = embed(prompt)
    result = search(query["embedding"], topk)

    movies = [
        f"{movie['title']} ({movie['year']}) in {movie['genres']}: {movie['plot']}"
        for movie in result
    ]

    return {"movies": movies}
```

The utility `ag.tracing.store_internals(...)`, as shown in the snippet above.

Upon serving the RAG Application again, the trace should now contain the formatted prompt

<img className="dark:hidden" src="/images/basic_guides/rag_final_trace_light.png" />
<img className="hidden dark:block" src="/images/basic_guides/rag_final_trace_dark.png" />

We are ready to resume the configuration of our RAG Evaluator.

## Configuring the RAG Evaluator

Back to the evaluators page, we can now map the `question` to the rag retriever internal prompt, denoted by `rag.retriever.internals.prompt`, as shown below.

<img className="dark:hidden" src="/images/basic_guides/rag_final_faithfulness_light.png" />
<img className="hidden dark:block" src="/images/basic_guides/rag_final_faithfulness_dark.png" />

Once the RAG Evaluator is set up, you are ready to go.

## Running the evaluation

You can now run evaluations on RAG Applications based on RAG Evaluators.

<img className="dark:hidden" src="/images/basic_guides/rag_evaluation_light.png" />
<img className="hidden dark:block" src="/images/basic_guides/rag_evaluation_dark.png" />
